Abstract

An error correcting code using a tree-like multilayer perceptron is proposed. An original message
$\boldsymbol{s}^0$ is encoded into a codeword $\boldsymbol{y}_0$ using a tree-like committee machine (committee tree) or a tree-like
parity machine (parity tree). Based on these architectures, several schemes featuring monotonic or
non-monotonic units are introduced. The codeword $\boldsymbol{y}_0$ is then transmitted via a Binary Asymmetric
Channel (BAC) where it is corrupted by noise. The analytical performance of these schemes is
investigated using the replica method of statistical mechanics. Under some specific conditions,
some of the proposed schemes are shown to saturate the Shannon bound at the infinite codeword
length limit. The influence of the monotonicity of the units on the performance is also discussed.

I. INTRODUCTION



Reliability in communication has always been a major concern when dealing with digital 
data. Especially in today's information-dependent society, it is vital to design efficient ways 
of preventing data corruption when transmitting information. Error correcting codes have 
been developed for this purpose since the birth of information theory, following the
work of Shannon.

In 1989, Sourlas derived a set of error correcting codes, the so-called Sourlas codes,
which theoretically saturate the Shannon bound [2]. Although these codes turned out to be
impractical, the main point of interest of Sourlas's paper was the parallel it drew between
physical spin glass systems and information theory.

Following this paper, the tools of statistical mechanics have been successfully applied to a
wide range of problems of information theory in recent years. In the field of error correcting
codes itself [3]-[5], as well as in spreading codes [6], [7] and compression codes, statistical
mechanical techniques have shown great potential.

The present paper uses similar techniques to investigate an error correcting code scheme 
where the codeword is encoded using tree-like multilayer perceptron neural networks. It 
is known that there exists a natural duality between lossy compression codes and error 
correcting codes. Indeed, a lossy compression code can be regarded as a standard error 
correcting code, but one where the codeword is generated using the original decoder of the 
error correcting code scheme and where the decompressed message is obtained using the 
original encoder of the scheme (Cf. for details). 

Recently, a lossy compression scheme based on a simple perceptron decoder was investigated
by Hosaka et al. [10]. In their paper, they used statistical mechanical techniques
to investigate the theoretical performance of their scheme at the infinite codeword length
limit. The perceptron they defined in their model uses a special hat-shaped non-monotonic
transfer function. This rather uncommon feature enables the scheme to deal with biased
messages. It is known that this type of function maximizes the storage capacity of the
simple perceptron [15], [16]. They found that their scheme can theoretically yield Shannon
optimal performance. Subsequently, Shinzato et al. investigated the same model but in
the framework of error correcting codes. They found that the model can again theoretically
yield Shannon optimal performance.

Based on these studies, Mimura et al. [12] proposed a tree-like multilayer perceptron
network for lossy compression purposes, but used only the standard sign function as the
transfer function of their model. They showed that the parity tree model can theoretically
yield Shannon optimal performance, but only when considering unbiased messages. In contrast,
they showed that the committee tree model cannot yield optimal performance, even
for unbiased messages. However, the advantage of using a multilayer structure is improved
replica symmetric solution stability, and an increased number of codewords sharing the same
distortion properties [18]. In a recent study, Cousseau et al. [13] investigated the same
tree-like multilayer perceptron model but used the hat-shaped non-monotonic transfer function
introduced by Hosaka et al. [10], thus combining the advantages of [10] and [12]. By doing so,
they were able to show that both parity tree and committee tree structures can then
theoretically yield Shannon optimal performance even for biased messages under some specific
conditions.

The purpose of the present paper is to discuss the performance of the same tree-like
perceptron models but in the error correcting code framework, thus completing the topic
of perceptron type network applications in coding theory. In this paper, we make use of
the Binary Asymmetric Channel (BAC). Indeed, the use of the non-monotonic hat-shaped
transfer function introduced by Hosaka et al. [10] enables us to control the bias of
the codeword sequence, and enables the relevant schemes to deal with such an asymmetric
channel (the BAC was also used by Shinzato et al.). On the other hand, we expect the
schemes which use the standard monotonic sign function to be able to deal only with the
Binary Symmetric Channel (BSC), which corresponds to a particular case of the BAC. Most
popular error correcting codes, such as turbo codes [19] and low density parity check (LDPC)
codes [20], [21], which provide near Shannon performance in practical time frames, have been
widely studied, but generally only for symmetric channels. Apart from a few studies [23],
little is known about their behavior on asymmetric channels.
Multilayer perceptrons have been widely studied over the years by the machine learning
community and a wide range of problems have been considered (storage capacity, learning
rules, etc.). These works revealed non-trivial behaviors of even simple models such as the
simple perceptron. Many of these previous results are summarized in reference [24]. The
present analysis gives us an opportunity to discuss, in a systematic manner, the difficulty
of decoding for densely connected systems (dense systems, as opposed to sparsely connected
systems such as LDPC codes) in the context of multilayer networks.
There has been relatively little discussion of dense systems, mainly because their
computational cost is higher than that of sparse systems. However, because of
their rich randomness, dense systems can possibly be regarded as pseudo-random codes, like
the dense limit of LDPC codes.

In this paper we mainly focus on the necessary conditions to get Shannon optimal per- 
formance. To discuss practical decoders, it is first necessary to investigate the optimality of 
our schemes. This includes discussion of the optimal parameters for the transfer function 
since we need to know these parameters to discuss the optimal decoder. In other words, we 
need a theoretical analysis of the performance before we can study the decoding problem. 

The paper is organized as follows. Section II introduces the framework of error correcting
codes. Section III describes our model. Section IV deals with the BAC capacity. Section
V presents the mathematical tools used to evaluate the performance of the present scheme.
Section VI states the results and elucidates the location of the phase transition, which
characterizes the best achievable performance of the model. Section VII is devoted to the
conclusion and discussion.

II. ERROR CORRECTING CODES 

In a general scheme, an original message $\boldsymbol{s}^0$ of size $N$ is encoded into a codeword
$\boldsymbol{y}_0$ of size $M$ by some encoding device. The aim of this stage is to add redundancy to the
original data. Therefore, we necessarily have $M > N$. Based on this redundancy, a proper
decoder should be able to recover the original data even if it is corrupted by noise
in the transmission channel. The quantity $R = N/M$ is called the code rate and evaluates
the trade-off between redundancy and codeword size. The codeword $\boldsymbol{y}_0$ is then fed into a
channel where the bits are subject to noise. The received noisy message $\boldsymbol{y}$ (which is also
$M$-dimensional) is then decoded using its redundancy to infer the original $N$-dimensional
message $\boldsymbol{s}^0$. In other words, in a Bayesian framework, one tries to maximize the following
posterior probability:

$$ P(\boldsymbol{s}|\boldsymbol{y}) = \frac{P(\boldsymbol{y}|\boldsymbol{s})\,P(\boldsymbol{s})}{\sum_{\boldsymbol{s}} P(\boldsymbol{y}|\boldsymbol{s})\,P(\boldsymbol{s})}. \tag{1} $$

As data transmission is costly, generally one wants to be able to ensure error-free
transmission while transmitting the fewest possible bits. In other words, one wants to ensure
error-free transmission while keeping the code rate as large as possible. For this purpose,
the well known Shannon bound gives a way to compute the best achievable code rate
which allows error-free recovery. However, while this gives us the value of such an optimal
code rate, it does not give any clue as to how to construct such an optimal code. Therefore,
several codes have been proposed over the years in an ongoing quest to find a code which
can reach this theoretical bound.



III. ERROR CORRECTING CODES USING MONOTONIC AND NON- 
MONOTONIC MULTILAYER PERCEPTRONS 

In this paper, since we make use of techniques derived from statistical mechanics, we will
use Ising variables rather than Boolean ones. The Boolean 0 is mapped onto 1 in the Ising
framework while the Boolean 1 is mapped to $-1$. This mapping can be used without any
loss of generality.

We assume that the original message $\boldsymbol{s}^0$ is generated from the uniform distribution and
that all the bits are independently generated so that we have

$$ P(\boldsymbol{s}^0) = \frac{1}{2^N}. \tag{2} $$

The channel considered in this study is the Binary Asymmetric Channel (BAC) where each
bit is flipped independently of the others with asymmetric probabilities. If the original bit
fed into the channel is 1, then it is flipped with probability $p$. Conversely, if the original bit
is $-1$, it is flipped with probability $r$. Figure 1 shows the BAC properties in detail. The
well known Binary Symmetric Channel (BSC) corresponds to the particular case $r = p$.

FIG. 1: The Binary Asymmetric Channel (BAC)

When the corrupted message $\boldsymbol{y}$ is received at the output of the channel, the goal is then
to recover $\boldsymbol{s}^0$ using $\boldsymbol{y}$. The state of the estimated message is denoted by the vector $\boldsymbol{s}$. The
general outline of the scheme is shown in Figure 2. From Figure 1 we can easily derive the
following conditional probability,



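$$ P(y^\mu | y_0^\mu) = \frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,y_0^\mu + (r - p)\right], \tag{3} $$

where we make use of the notations $\boldsymbol{y}_0 = (y_0^1, \ldots, y_0^\mu, \ldots, y_0^M)$, $\boldsymbol{y} = (y^1, \ldots, y^\mu, \ldots, y^M)$.
Since we assume that the bits are flipped independently, we deduce

$$ P(\boldsymbol{y}|\boldsymbol{y}_0) = \prod_{\mu=1}^{M}\left\{\frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,y_0^\mu + (r - p)\right]\right\}. \tag{4} $$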
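The channel model of equations (3) and (4) can be illustrated with a short simulation. The following is only a minimal sketch in Python: the function names (`bac_likelihood`, `bac_transmit`, `sequence_likelihood`) are our own illustrative choices, and bits are represented as the Ising values $+1$ and $-1$.

```python
import random


def bac_likelihood(y, y0, p, r):
    """P(y | y0) for a single Ising bit, written as in Eq. (3)."""
    return 0.5 + 0.5 * y * ((1.0 - r - p) * y0 + (r - p))


def bac_transmit(codeword, p, r, rng):
    """Send a list of +1/-1 bits through the BAC.

    A +1 bit is flipped with probability p, a -1 bit with probability r,
    each bit independently of the others.
    """
    received = []
    for bit in codeword:
        flip_prob = p if bit == 1 else r
        received.append(-bit if rng.random() < flip_prob else bit)
    return received


def sequence_likelihood(y, y0, p, r):
    """P(y | y0) for whole sequences: the product over bits of Eq. (4)."""
    prob = 1.0
    for y_mu, y0_mu in zip(y, y0):
        prob *= bac_likelihood(y_mu, y0_mu, p, r)
    return prob


# Example usage (the parameter values are arbitrary):
rng = random.Random(0)
sent = [1, -1, 1, 1, -1]
received = bac_transmit(sent, p=0.1, r=0.3, rng=rng)
likelihood = sequence_likelihood(received, sent, p=0.1, r=0.3)
```

Setting `p == r` in this sketch gives the BSC as a special case.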

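To encode the original message $\boldsymbol{s}^0$ into a codeword $\boldsymbol{y}_0$, we use three non-monotonic
tree-like parity machine or committee machine neural networks ((I), (II) and (III)). In the
same way, we also investigate the standard monotonic parity tree and committee tree neural
networks ((IV) and (V)).

(I) Multilayer parity tree with non-monotonic hidden units (PTH).

$$ y_0^\mu(\boldsymbol{s}^0) \equiv \prod_{l=1}^{K} f_k\left(\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right). \tag{5} $$

(II) Multilayer committee tree with non-monotonic hidden units (CTH).

$$ y_0^\mu(\boldsymbol{s}^0) \equiv \mathrm{sgn}\left(\sum_{l=1}^{K} f_k\left(\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right)\right). \tag{6} $$

Note that in this case, if the number of hidden units $K$ is even, it is possible to get 0 as the
argument of the sign function. We avoid this uncertainty by considering only an odd number
of hidden units for the committee tree with non-monotonic hidden units in the sequel.

(III) Multilayer committee tree with a non-monotonic output unit (CTO).

$$ y_0^\mu(\boldsymbol{s}^0) \equiv f_k\left(\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}\left(\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right)\right). \tag{7} $$

(IV) Multilayer parity tree (PT).

$$ y_0^\mu(\boldsymbol{s}^0) \equiv \prod_{l=1}^{K} \mathrm{sgn}\left(\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right). \tag{8} $$

(V) Multilayer committee tree (CT).

$$ y_0^\mu(\boldsymbol{s}^0) \equiv \mathrm{sgn}\left(\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}\left(\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right)\right). \tag{9} $$

In this case also, if the number of hidden units $K$ is even, it is possible to get 0 as the
argument of the sign function. We again avoid this uncertainty by considering only an odd
number of hidden units for the committee tree in the sequel.

FIG. 2: Layout of the scheme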

The original message $\boldsymbol{s}^0$ is split into $K$ disjoint $N/K$-dimensional vectors so that $\boldsymbol{s}^0$ can
be written $\boldsymbol{s}^0 = (\boldsymbol{s}_1^0, \ldots, \boldsymbol{s}_K^0)$. In schemes (I), (II), and (III), $f_k$ is a non-monotonic function
of a real parameter $k$ of the form

$$ f_k(x) = \begin{cases} 1 & \text{if } |x| \le k, \\ -1 & \text{if } |x| > k, \end{cases} \tag{10} $$

and the vectors $\boldsymbol{x}_l^\mu$ are fixed $N/K$-dimensional independent vectors uniformly distributed
on $\{-1, 1\}$. The use of random input vectors is known to maximize the storage capacity
of perceptron networks, making such a scheme promising for error correcting tasks. The
sgn function denotes the sign function taking 1 for $x \ge 0$ and $-1$ for $x < 0$. Each of
these architectures applies a different non-linear transformation to the original data $\boldsymbol{s}^0$. The
general architecture of these perceptron-based encoders and the non-monotonic function $f_k$
are displayed in Figure 3. Note that we can also consider an encoder based on a committee
tree where both the hidden units and the output unit are non-monotonic. However, this
introduces an extra parameter to tune (we would have one threshold parameter for the hidden
units and one for the output unit) and the performance should not change drastically. For
simplicity, we restrict our study to the above three cases.

FIG. 3: Left: General architecture of the tree-like multilayer perceptrons with N input units and
K hidden units. Right: The non-monotonic function $f_k$.

To keep the notation as general as possible, as long as explicit use of the encoder is not
necessary in computations, we will denote the transformation performed on vector $\boldsymbol{s}$ by the
respective encoders using the following notation:

$$ y_0^\mu(\boldsymbol{s}) \equiv \mathcal{F}_k\left(\left\{\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right). \tag{11} $$

$\mathcal{F}_k$ takes a different expression for each of the five types of network, and the subscript $k$
denotes the fact that all the encoders depend on a real threshold parameter $k$ (except for
schemes (IV) and (V), where this function does not depend on $k$; for consistency, we keep this
notation for these schemes as well). Furthermore, note that $\mathcal{F}_k(\{u_l\})$ contains all the terms
depending on index $l$ (i.e., $\mathcal{F}_k(\{u_l\})$ contains all the terms $u_1, \ldots, u_l, \ldots, u_K$).
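To make the five encoders concrete, the following Python sketch computes a codeword for a small message. It is an illustration only: the function names, the flat-list data layout, and the scaling $\sqrt{K/N}$ of the local fields (chosen so that each field has unit variance for random $\pm 1$ inputs) are our assumptions, not a prescribed implementation.

```python
import math


def sgn(x):
    """Sign function: 1 for x >= 0 and -1 for x < 0."""
    return 1 if x >= 0 else -1


def f_k(x, k):
    """Hat-shaped non-monotonic unit: 1 if |x| <= k, -1 otherwise."""
    return 1 if abs(x) <= k else -1


def local_fields(s_blocks, x_blocks):
    """Local field u_l = sqrt(K/N) * (s_l . x_l) for each of the K branches."""
    K = len(s_blocks)
    N = K * len(s_blocks[0])
    scale = math.sqrt(K / N)
    return [scale * sum(a * b for a, b in zip(s_l, x_l))
            for s_l, x_l in zip(s_blocks, x_blocks)]


def encode_bit(u, scheme, k=None):
    """Output of the tree for local fields u under one of the five schemes."""
    K = len(u)
    if scheme == "PTH":   # parity tree, non-monotonic hidden units
        return math.prod(f_k(u_l, k) for u_l in u)
    if scheme == "CTH":   # committee tree, non-monotonic hidden units
        return sgn(sum(f_k(u_l, k) for u_l in u))
    if scheme == "CTO":   # committee tree, non-monotonic output unit
        return f_k(sum(sgn(u_l) for u_l in u) / math.sqrt(K), k)
    if scheme == "PT":    # standard parity tree
        return math.prod(sgn(u_l) for u_l in u)
    if scheme == "CT":    # standard committee tree
        return sgn(sum(sgn(u_l) for u_l in u))
    raise ValueError("unknown scheme: " + scheme)


def encode(s, xs, K, scheme, k=None):
    """Encode message s (N Ising bits) with M fixed patterns xs (N bits each)."""
    n = len(s) // K
    s_blocks = [s[l * n:(l + 1) * n] for l in range(K)]
    codeword = []
    for x in xs:
        x_blocks = [x[l * n:(l + 1) * n] for l in range(K)]
        codeword.append(encode_bit(local_fields(s_blocks, x_blocks), scheme, k))
    return codeword
```

Each pattern in `xs` yields one codeword bit, so a list of $M$ patterns produces a codeword of length $M$ and a code rate $R = N/M$.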

IV. BINARY ASYMMETRIC CHANNEL (BAC) CAPACITY 

In this section, we compute the capacity of the BAC. According to Shannon's channel
coding theorem, the optimal code rate is given by the capacity of the channel. Any code
rate larger than the channel capacity will inevitably lead to information loss. The definition
of the channel capacity $C$ is

$$ C = \max_{\text{input probability}} I(X, Y), \tag{12} $$

where $I$ denotes mutual information, $X$ denotes the channel input distribution, and $Y$
denotes the channel output distribution. Computation of the capacity of such a binary
channel requires only simple algebra and calculations are straightforward, giving

$$ C_{BAC} = H_2(\gamma_c) - \frac{1 + \Omega_c}{2} H_2(p) - \frac{1 - \Omega_c}{2} H_2(r), \tag{13} $$

where

$$ H_2(x) = -x \log_2(x) - (1 - x) \log_2(1 - x), \tag{14} $$

$$ \gamma_c = \frac{1}{2}\left[(1 - p)(1 + \Omega_c) + r(1 - \Omega_c)\right], \tag{15} $$

$$ \Delta_c = \left[\frac{r^r (1 - r)^{1 - r}}{p^p (1 - p)^{1 - p}}\right]^{\frac{1}{1 - r - p}}, \tag{16} $$

$$ \Omega_c = \frac{2\gamma_c - 1 - r + p}{1 - r - p}. \tag{17} $$

In the special case $r = p$, the capacity simplifies to

$$ C_{BSC} = 1 - H_2(p), \tag{18} $$

which corresponds to the capacity of the BSC.
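The capacity formula can be checked numerically. The sketch below is illustrative only and rests on one assumption that is not spelled out above: the optimal output probability is taken to be $\gamma_c = 1/(1 + \Delta_c)$, which is what the stationarity condition of the mutual information with respect to the input bias gives. The function names are hypothetical, and the sketch requires $0 < p, r < 1$ with $p + r \neq 1$.

```python
import math


def h2(x):
    """Binary entropy H_2(x) in bits, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def mutual_information(omega, p, r):
    """I(X, Y) of the BAC when the input bits have mean omega.

    gamma is the probability that the channel output equals +1.
    """
    gamma = 0.5 * ((1.0 - p) * (1.0 + omega) + r * (1.0 - omega))
    return h2(gamma) - 0.5 * (1.0 + omega) * h2(p) - 0.5 * (1.0 - omega) * h2(r)


def bac_capacity(p, r):
    """Return (C_BAC, Omega_c) following Eqs. (13)-(17)."""
    delta_c = ((r ** r * (1.0 - r) ** (1.0 - r))
               / (p ** p * (1.0 - p) ** (1.0 - p))) ** (1.0 / (1.0 - r - p))
    gamma_c = 1.0 / (1.0 + delta_c)   # assumption: stationarity condition
    omega_c = (2.0 * gamma_c - 1.0 - r + p) / (1.0 - r - p)
    return mutual_information(omega_c, p, r), omega_c


# Cross-check against a brute-force maximization over the input bias.
capacity, omega_c = bac_capacity(0.1, 0.3)
brute_force = max(mutual_information(-1.0 + 0.001 * i, 0.1, 0.3)
                  for i in range(2001))
```

In this sketch `brute_force` should agree with `capacity` up to the resolution of the grid, and `bac_capacity(p, p)` reduces to the BSC value $1 - H_2(p)$ with $\Omega_c = 0$.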



V. ANALYTICAL EVALUATION 

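As stated in Section II, our goal is to maximize the posterior $P(\boldsymbol{s}|\boldsymbol{y})$. Let us define the
following Hamiltonian:

$$ \mathcal{H}(\boldsymbol{y}, \boldsymbol{s}) = -\ln\left[P(\boldsymbol{y}|\boldsymbol{s})P(\boldsymbol{s})\right] = -\ln P(\boldsymbol{y}, \boldsymbol{s}). \tag{19} $$

The ground state of the above Hamiltonian trivially corresponds to the maximum a posteriori
(MAP) estimator of the posterior $P(\boldsymbol{s}|\boldsymbol{y})$. Then, let us compute the joint probability of $\boldsymbol{y}$
and $\boldsymbol{s}$. We have

$$ P(\boldsymbol{y}, \boldsymbol{s}) = P(\boldsymbol{y}|\boldsymbol{s})P(\boldsymbol{s}). \tag{20} $$

Since the relation between an arbitrary message $\boldsymbol{s}$ and the codeword fed into the channel is
deterministic, for any $\boldsymbol{s}$, we can write

$$ P(\boldsymbol{y}|\boldsymbol{s}) = P\left(\boldsymbol{y}\,\Big|\,\mathcal{F}_k\left(\left\{\sqrt{\tfrac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right)\right) = \prod_{\mu=1}^{M}\left\{\frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,\mathcal{F}_k\left(\left\{\sqrt{\tfrac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right) + (r - p)\right]\right\}. \tag{21} $$

We finally get the explicit expression of the Hamiltonian,

$$ \mathcal{H}(\boldsymbol{y}, \boldsymbol{s}) = -\ln P(\boldsymbol{y}, \boldsymbol{s}) = -\ln\left(\frac{1}{2^N}\prod_{\mu=1}^{M}\left\{\frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,\mathcal{F}_k\left(\left\{\sqrt{\tfrac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right) + (r - p)\right]\right\}\right). \tag{22} $$

Using this Hamiltonian, we can define the following partition function

$$ Z(\beta, \boldsymbol{y}, \boldsymbol{x}) = \sum_{\boldsymbol{s}} \exp\left[-\beta \mathcal{H}(\boldsymbol{y}, \boldsymbol{s})\right], \tag{23} $$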



where the sum over $\boldsymbol{s}$ represents the sum over all possible states of the vector $\boldsymbol{s}$, and $\beta$ is
the inverse temperature parameter. Such a partition function can be identified with the
partition function of a spin glass system with dynamical variables $\boldsymbol{s}$ and quenched variables
$\boldsymbol{x}$. The average of this partition function over $\boldsymbol{y}$ and $\boldsymbol{x}$ naturally contains all the interesting
typical properties of the scheme, such as the free energy. However, this average is hard to
evaluate directly. In this paper, we use the so-called
replica method to calculate the average of the partition function. Once the free energy is
obtained, one can compute the critical code rate at which a phase transition occurs between
the ferromagnetic phase (error recovery possible) and the paramagnetic phase (decoding
impossible). This gives us the best code rate the scheme can achieve. A code rate exceeding
this critical value will make decoding impossible. The calculations to obtain the average of
the partition function $\langle Z(\beta, \boldsymbol{y}, \boldsymbol{x})\rangle_{\boldsymbol{y}, \boldsymbol{x}}$ are detailed in Appendix A.
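For very small systems, the Hamiltonian and the partition function can be evaluated by exhaustive enumeration, which gives a useful sanity check before turning to the replica calculation. The sketch below does this for the parity tree with non-monotonic hidden units. It is only an illustration under our own assumptions (function names, the $\sqrt{K/N}$ field scaling, a uniform prior $2^{-N}$), and its cost grows as $2^N$, so it says nothing about practical decoding.

```python
import itertools
import math


def f_k(x, k):
    """Hat-shaped non-monotonic unit: 1 if |x| <= k, -1 otherwise."""
    return 1 if abs(x) <= k else -1


def encode_pth(s, xs, K, k):
    """Parity tree with non-monotonic hidden units (scheme (I))."""
    n = len(s) // K
    scale = math.sqrt(K / len(s))
    codeword = []
    for x in xs:
        bit = 1
        for l in range(K):
            u = scale * sum(s[i] * x[i] for i in range(l * n, (l + 1) * n))
            bit *= f_k(u, k)
        codeword.append(bit)
    return codeword


def joint_probability(y, s, xs, K, k, p, r):
    """P(y, s) = 2^(-N) * prod_mu P(y^mu | y_0^mu(s)), i.e. exp(-H(y, s))."""
    prob = 0.5 ** len(s)
    for y_mu, y0_mu in zip(y, encode_pth(s, xs, K, k)):
        prob *= 0.5 + 0.5 * y_mu * ((1.0 - r - p) * y0_mu + (r - p))
    return prob


def partition_function(y, xs, N, K, k, p, r, beta=1.0):
    """Z(beta, y, x): sum over all 2^N messages of exp(-beta * H(y, s))."""
    return sum(joint_probability(y, s, xs, K, k, p, r) ** beta
               for s in itertools.product((1, -1), repeat=N))


def map_estimate(y, xs, N, K, k, p, r):
    """Ground state of H(y, s) by brute force (ties broken arbitrarily)."""
    return max(itertools.product((1, -1), repeat=N),
               key=lambda s: joint_probability(y, s, xs, K, k, p, r))
```

At $\beta = 1$ the partition function equals the marginal probability $P(\boldsymbol{y})$, so summing it over all $2^M$ received words should give 1. Because $f_k$ is even, $\boldsymbol{s}$ and $-\boldsymbol{s}$ produce the same codeword under this scheme, so the brute-force ground state is at best unique up to this global flip.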

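After long calculations, the replica symmetric (RS) free energy is obtained:

$$ -f_{RS}(q, \hat{q}, m, \hat{m}) = \mathop{\mathrm{extr}}_{q, \hat{q}, m, \hat{m}} \Bigg\{ \sum_{y = \pm 1} \int \Big[\prod_{l=1}^{K} Dt_l\Big] \int \Big[\prod_{l=1}^{K} DR_l\Big] \left(\frac{1}{2} + \frac{y}{2}\left[(1 - r - p)\,\mathcal{F}_k(\{R_l\}) + (r - p)\right]\right) \ln I(y, R_l, t_l, m, q) $$
$$ \qquad + R \int_{-\infty}^{+\infty} DU \ln\cosh\left(\sqrt{\hat{q}}\,U + \hat{m}\right) - R m \hat{m} - \frac{R}{2}\hat{q}(1 - q) \Bigg\}, \tag{24} $$

where

$$ I(y, R_l, t_l, m, q) = \int \Big[\prod_{l=1}^{K} Dz_l\Big] \left\{\frac{1}{2} + \frac{y}{2}(r - p) + \frac{y}{2}(1 - r - p)\,\mathcal{F}_k\left(\left\{\sqrt{1 - q}\,z_l + \sqrt{q - m^2}\,t_l + m R_l\right\}\right)\right\}, \tag{25} $$

$$ Dx = \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}, \tag{26} $$

and where extr denotes extremization. The sum $\sum_{y = \pm 1}$ denotes the sum over all possible states
of the variable $y$, that is $\pm 1$.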

Note also that we set $\beta = 1$. This choice of finite temperature decoding (in contrast to
$\beta \to \infty$, which corresponds to the zero temperature limit) corresponds to the maximizer of
posterior marginals (MPM) estimator, while zero temperature decoding corresponds to
the MAP estimator [25]. The MPM estimator is known to be optimal for the purpose of
decoding [26]-[28]. In addition, in this paper we suppose that all the channel properties
(i.e., the true values of $(p, r)$) are known to the decoder, which implies that the state of
the system we consider is located on the Nishimori line [13].

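To retrieve the free energy one has to extremize (24) with respect to the order parameters
$q, \hat{q}, m, \hat{m}$. This is done by solving the following saddle point equations:

$$ \frac{\partial f_{RS}}{\partial q} = 0 \;\Leftrightarrow\; \hat{q} = -\frac{2}{R} \sum_{y = \pm 1} \int \Big[\prod_{l=1}^{K} Dt_l\Big] \int \Big[\prod_{l=1}^{K} DR_l\Big] \left(\frac{1}{2} + \frac{y}{2}\left[(1 - r - p)\,\mathcal{F}_k(\{R_l\}) + (r - p)\right]\right) \frac{I'_q(y, R_l, t_l, m, q)}{I(y, R_l, t_l, m, q)}, \tag{27} $$

$$ \frac{\partial f_{RS}}{\partial m} = 0 \;\Leftrightarrow\; \hat{m} = \frac{1}{R} \sum_{y = \pm 1} \int \Big[\prod_{l=1}^{K} Dt_l\Big] \int \Big[\prod_{l=1}^{K} DR_l\Big] \left(\frac{1}{2} + \frac{y}{2}\left[(1 - r - p)\,\mathcal{F}_k(\{R_l\}) + (r - p)\right]\right) \frac{I'_m(y, R_l, t_l, m, q)}{I(y, R_l, t_l, m, q)}, \tag{28} $$

$$ \frac{\partial f_{RS}}{\partial \hat{q}} = 0 \;\Leftrightarrow\; q = \int_{-\infty}^{+\infty} DU \tanh^2\left(\sqrt{\hat{q}}\,U + \hat{m}\right), \tag{29} $$

$$ \frac{\partial f_{RS}}{\partial \hat{m}} = 0 \;\Leftrightarrow\; m = \int_{-\infty}^{+\infty} DU \tanh\left(\sqrt{\hat{q}}\,U + \hat{m}\right), \tag{30} $$

where

$$ I'_q(y, R_l, t_l, m, q) = \frac{\partial I(y, R_l, t_l, m, q)}{\partial q}, \tag{31} $$

$$ I'_m(y, R_l, t_l, m, q) = \frac{\partial I(y, R_l, t_l, m, q)}{\partial m}. \tag{32} $$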


An error correcting code scheme typically admits two solutions: one where $m = q = 1$, called
the ferromagnetic solution, and one where $m = q = 0$, called the paramagnetic solution. As
the names indicate, these solutions are named after the phases of a physical ferromagnet and
correspond to the case where the spins are all ordered ($m = q = 1$) or to the case where the
spins take completely random states ($m = q = 0$). As we can deduce from equations (A3) and (A6), the
ferromagnetic solution corresponds to decoding success since $m = 1$ implies perfect overlap.
Conversely, the paramagnetic phase implies failure in the decoding process (the overlap $m$ is 0).

A. Replica symmetric solution using a parity tree with non-monotonic hidden units

Using a parity tree with non-monotonic hidden units (5), the encoder function becomes

$$ \mathcal{F}_k(\{u_l\}) = \prod_{l=1}^{K} f_k(u_l). \tag{33} $$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations,
one can find a consistent solution where $q = m = \hat{q} = \hat{m} = 0$. This corresponds to the
paramagnetic solution, where decoding of the received message fails. Using these conditions
in (24), one can retrieve the free energy of the paramagnetic phase,

$$ -f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega_{PTH}) + r(1 - \Omega_{PTH})\right]\right) \times \ln 2, \tag{34} $$

where

$$ \Omega_{PTH} = \prod_{l=1}^{K} \int_{-\infty}^{+\infty} Dz_l\, f_k(z_l). \tag{35} $$

In the same way, substituting $m = q = 1$ in the saddle point equations, one can find a
consistent solution. However, the ferromagnetic solution cannot be computed analytically,
so we proceed numerically by simply checking the integrand of equations (27) and (28). We
did that extensively for $K = 1$, $K = 2$, and $K = 3$. In each case we found that
the integrand diverges so that when $(q, m) \to (1, 1)$, we have both $\hat{q} \to \infty$ and $\hat{m} \to \infty$.
Substituting $\hat{q} \to \infty$ and $\hat{m} \to \infty$ into (29) and (30) clearly yields $q = m = 1$. So $q = m = 1$,
$\hat{q} \to \infty$ and $\hat{m} \to \infty$ is a consistent solution of the saddle point equations which corresponds
to the ferromagnetic solution, where decoding of the received message succeeds. We also
checked higher values of $K$ (up to $K = 5$) and did not find any other consistent solution. We
conjecture that this result holds for any finite value of $K$. Finally, substituting $m = q = 1$,
$\hat{m} \to \infty$ and $\hat{q} \to \infty$ into (24), one can get the free energy of the ferromagnetic phase,

$$ -f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega_{PTH}) H_2(p) + (1 - \Omega_{PTH}) H_2(r)\right] - R \ln 2. \tag{36} $$

Note that when $K = 1$, the present scheme corresponds to the case of Shinzato et al.
The result we obtained when $K = 1$ is indeed equivalent to what they found.
B. Replica symmetric solution using a committee tree with non-monotonic hidden units

When a committee tree with non-monotonic hidden units (6) is used, the encoder function
becomes

$$ \mathcal{F}_k(\{u_l\}) = \mathrm{sgn}\left(\sum_{l=1}^{K} f_k(u_l)\right). \tag{37} $$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations,
one can find a consistent solution where $q = m = \hat{q} = \hat{m} = 0$. This corresponds to the
paramagnetic solution, where decoding of the received message fails. Using these conditions
in (24), one can retrieve the free energy of the paramagnetic phase,

$$ -f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega_{CTH}) + r(1 - \Omega_{CTH})\right]\right) \times \ln 2, \tag{38} $$

where

$$ \Omega_{CTH} = \int \Big[\prod_{l=1}^{K} Dz_l\Big] \times \mathrm{sgn}\left(\sum_{l=1}^{K} f_k(z_l)\right). \tag{39} $$

In the same way, by substituting $m = q = 1$ in the saddle point equations one can find a
consistent solution. However, the ferromagnetic solution cannot be computed analytically,
so we proceed numerically by simply checking the integrand of equations (27) and (28). We
did that extensively for $K = 3$ (we consider only odd values of $K$ for this scheme, and
when $K = 1$ the present scheme is equivalent to the parity tree case). We found that the
integrand diverges so that when $(q, m) \to (1, 1)$, we have both $\hat{q} \to \infty$ and $\hat{m} \to \infty$. We also
checked higher values of $K$ (up to $K = 5$) and did not find any other consistent solution. We
conjecture that this result holds for any finite value of $K$. Finally, substituting $m = q = 1$,
$\hat{m} \to \infty$ and $\hat{q} \to \infty$ into (24), one can get the free energy of the ferromagnetic phase,

$$ -f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega_{CTH}) H_2(p) + (1 - \Omega_{CTH}) H_2(r)\right] - R \ln 2. \tag{40} $$
C. Replica symmetric solution using a committee tree with a non-monotonic output unit

When a committee tree with a non-monotonic output unit (7) is used, the encoder function
becomes

$$ \mathcal{F}_k(\{u_l\}) = f_k\left(\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}(u_l)\right). \tag{41} $$

Using this encoder function, substituting $m = q = 0$ in the saddle point equations does
not imply $\hat{m} = \hat{q} = 0$, and a non-trivial solution is found, which makes the free energy too
complex to be investigated. This scheme is likely to give non-optimal performance in such
a case and will not be considered in what follows.

Note that the limit where $K \to \infty$ was not studied because the saddle point equations
take a non-trivial form that is difficult to investigate (in the lossy compression case, this
study is still tractable). The techniques to investigate the free energy in the $K \to \infty$ limit
described in reference [24] cannot be easily applied here. However, based on the previous
results of Cousseau et al. [13], it is probable that in the $K \to \infty$ limit, the committee tree
with a non-monotonic output unit saturates the Shannon bound in the general BAC case.



D. Replica symmetric solution using a parity tree

Using a parity tree (8), the encoder function becomes

$$ \mathcal{F}_k(\{u_l\}) = \prod_{l=1}^{K} \mathrm{sgn}(u_l). \tag{42} $$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations, one
can find a consistent solution where $q = m = \hat{q} = \hat{m} = 0$, but only when $K > 1$. This
corresponds to the paramagnetic solution, where decoding of the received message fails.
Using these conditions in (24), one can retrieve the free energy of the paramagnetic phase,

$$ -f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega_{PT}) + r(1 - \Omega_{PT})\right]\right) \times \ln 2, \tag{43} $$

where

$$ \Omega_{PT} = \prod_{l=1}^{K} \int_{-\infty}^{+\infty} Dz_l \times \mathrm{sgn}(z_l). \tag{44} $$

When $K = 1$ is considered, $m = q = 0$ does not imply $\hat{m} = \hat{q} = 0$ and a non-trivial solution
is found that makes the free energy too complex to be investigated. The scheme is likely to
give non-optimal performance in such a case and will not be considered in what follows.

In the same way, substituting $m = q = 1$ in the saddle point equations, one can find
a consistent solution, but only when $K > 1$. However, the ferromagnetic solution cannot
be computed analytically, so we proceed numerically by simply checking the integrand of
equations (27) and (28). We did that extensively for $K = 2$ and $K = 3$. In
each case, we found that the integrand diverges so that when $(q, m) \to (1, 1)$ we have both
$\hat{q} \to \infty$ and $\hat{m} \to \infty$. We also checked higher values of $K$ (up to $K = 5$) and did not find
any other consistent solution. We conjecture that this result holds for any finite value of
$K > 1$. Finally, substituting $m = q = 1$, $\hat{m} \to \infty$ and $\hat{q} \to \infty$ into (24), one can get the free
energy of the ferromagnetic phase,

$$ -f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega_{PT}) H_2(p) + (1 - \Omega_{PT}) H_2(r)\right] - R \ln 2. \tag{45} $$



E. Replica symmetric solution using a committee tree

Using a committee tree (9), the encoder function becomes

$$ \mathcal{F}_k(\{u_l\}) = \mathrm{sgn}\left(\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}(u_l)\right). \tag{46} $$

Using this encoder function, substituting $m = q = 0$ in the saddle point equations does
not imply $\hat{m} = \hat{q} = 0$, and a non-trivial solution is found that makes the free energy too
complex to be investigated. This scheme is likely to give non-optimal performance in such
a case and will not be considered in what follows. As in the lossy compression case [12], the
committee tree is unable to yield Shannon optimal performance.

Note that the limit where $K \to \infty$ was not studied because the saddle point equations
take a non-trivial form that is difficult to investigate (in the lossy compression case, this
study is still tractable). The techniques to investigate the free energy in the $K \to \infty$ limit
described in reference [24] cannot be easily applied here. However, based on the previous
results of Mimura et al. [12], it is probable that in the $K \to \infty$ limit the committee tree
still fails to saturate the Shannon bound even in the BSC case.
VI. PHASE TRANSITION

For the parity and committee tree with non-monotonic hidden units and for the standard
parity tree, we found a paramagnetic and a ferromagnetic solution of the following form:

$$ -f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega) + r(1 - \Omega)\right]\right) \times \ln 2, \tag{47} $$

$$ -f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega) H_2(p) + (1 - \Omega) H_2(r)\right] - R \ln 2, \tag{48} $$

where $\Omega$ is given by $\Omega_{PTH}$, $\Omega_{CTH}$, or $\Omega_{PT}$ depending on the encoder considered.

It then becomes possible to calculate the critical value of the code rate $R$ at which a
sharp phase transition occurs between the ferromagnetic and the paramagnetic phase. This
indicates the boundary between possible decoding (ferromagnetic phase) and impossible
decoding (paramagnetic phase). In other words, this enables us to calculate the optimal
code rate for each scheme. At the phase transition point, we have

$$ f_{para} = f_{ferro}. \tag{49} $$

Simple algebra leads to

$$ R = H_2(\gamma) - \frac{1 + \Omega}{2} H_2(p) - \frac{1 - \Omega}{2} H_2(r), \tag{50} $$

where

$$ \gamma = \frac{1}{2}\left[(1 - p)(1 + \Omega) + r(1 - \Omega)\right] \tag{51} $$

and where $\Omega$ is given by the encoder considered ($\Omega_{PTH}$, $\Omega_{CTH}$, or $\Omega_{PT}$). This equation has
exactly the same form as the BAC capacity equation (13) and in fact is equivalent to the
BAC capacity if and only if $\Omega = \Omega_c$. Since $\Omega$ depends on the encoder, we will treat each
case in the following subsections.
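The algebra behind the critical code rate (50) can be spelled out in a few lines. This is only a sketch of the intermediate steps, written with $\gamma$ as defined in (51) and assuming that the ferromagnetic free energy carries the prefactor $-\ln 2 / 2$ in front of the bracket, which is the form under which the two free energies reproduce (50).

```latex
\begin{align*}
-f_{\mathrm{para}} &= -f_{\mathrm{ferro}} \\
-H_2(\gamma)\,\ln 2 &= -\frac{\ln 2}{2}\Big[(1+\Omega)\,H_2(p) + (1-\Omega)\,H_2(r)\Big] - R\,\ln 2 \\
R\,\ln 2 &= H_2(\gamma)\,\ln 2 - \frac{\ln 2}{2}\Big[(1+\Omega)\,H_2(p) + (1-\Omega)\,H_2(r)\Big] \\
R &= H_2(\gamma) - \frac{1+\Omega}{2}\,H_2(p) - \frac{1-\Omega}{2}\,H_2(r).
\end{align*}
```

The last line is the mutual information between channel input and output when the codeword bits have mean $\Omega$, which is why tuning $\Omega$ to $\Omega_c$ turns the critical rate into the channel capacity.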

A. Tuning of the parity tree with non-monotonic hidden units

In the parity tree with non-monotonic hidden units case, we have

$$ \Omega = \Omega_{PTH} = \prod_{l=1}^{K} \int_{-\infty}^{+\infty} Dz_l\, f_k(z_l). \tag{52} $$

The parity tree with non-monotonic hidden units is optimal if and only if

$$ \Omega_{PTH} = \Omega_c \;\Leftrightarrow\; H(k) = \frac{1}{4}\left(1 - \Omega_c^{1/K}\right), \tag{53} $$

where

$$ H(x) = \int_{x}^{+\infty} Dz. \tag{54} $$

This gives us a condition on the threshold parameter $k$ of the non-monotonic transfer
function $f_k$. If the threshold $k$ is tuned to satisfy (53), the scheme achieves the Shannon limit.
The only remaining issue is whether such an optimal threshold $k$ exists.

We solved (53) numerically over a range of channel parameters $(p, r)$ with $p, r > 0$ and always
found an optimal threshold parameter up to $K = 11$. Note that $\Omega_c$ can be negative, which
causes problems for the $K$-th root when considering an even number of hidden units $K$. However,
a simple permutation of the probabilities $p$ and $r$ changes the sign of $\Omega_c$. Since the original
messages are drawn from the uniform distribution, this permutation can be done without any loss of
generality. Instead of using $\boldsymbol{s}^0$, one uses $-\boldsymbol{s}^0$. We did not check higher values of $K$, but we
conjecture that the same result holds. This means that the parity tree with non-monotonic
hidden units saturates the Shannon bound in the large codeword length limit for any number
of hidden units $K$.
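The tuning condition (53) is easy to solve numerically because $H(k)$ decreases monotonically from $1/2$ at $k = 0$ to 0. The sketch below uses bisection; it is an illustration only, with hypothetical function names, and it relies on the identity $\int Dz\, f_k(z) = 1 - 4H(k)$, which follows from the definition of $f_k$.

```python
import math


def H(x):
    """Gaussian tail probability H(x) = int_x^inf Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def omega_pth(k, K):
    """Omega_PTH for threshold k and K hidden units.

    Each factor is int Dz f_k(z) = P(|z| <= k) - P(|z| > k) = 1 - 4 H(k).
    """
    return (1.0 - 4.0 * H(k)) ** K


def optimal_threshold_pth(omega_c, K, tol=1e-12):
    """Solve H(k) = (1 - omega_c^(1/K)) / 4 for k >= 0 by bisection."""
    if omega_c < 0.0:
        if K % 2 == 0:
            # No real K-th root: swap the roles of p and r first.
            raise ValueError("negative omega_c with an even number of units")
        root = -((-omega_c) ** (1.0 / K))
    else:
        root = omega_c ** (1.0 / K)
    target = 0.25 * (1.0 - root)
    lo, hi = 0.0, 10.0   # H(0) = 1/2 and H(10) is numerically negligible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example usage with an arbitrary target bias and K = 3 hidden units:
k_opt = optimal_threshold_pth(0.05, 3)
```

Feeding `k_opt` back into `omega_pth` should return the requested bias, which is a quick way to confirm that a threshold exists for a given pair of channel parameters.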

B. Tuning of the committee tree with non-monotonic hidden units

In the committee tree with non-monotonic hidden units case, we have

$$ \Omega = \Omega_{CTH} = \int \Big[\prod_{l=1}^{K} Dz_l\Big] \times \mathrm{sgn}\left(\sum_{l=1}^{K} f_k(z_l)\right). \tag{55} $$

The committee tree with non-monotonic hidden units is optimal if and only if

$$ \Omega_{CTH} = \Omega_c \;\Leftrightarrow\; \Omega_c = \sum_{l=0}^{\frac{K-1}{2}} \binom{K}{l} \left( [1 - 2H(k)]^{K-l} [2H(k)]^{l} - [2H(k)]^{K-l} [1 - 2H(k)]^{l} \right), \tag{56} $$

where $\binom{K}{l}$ denotes the binomial coefficient. This gives us a condition on the threshold
parameter $k$ of the non-monotonic transfer function $f_k$. If the threshold $k$ is tuned to satisfy
(56), the scheme achieves the Shannon limit. Thus, we should check if such an optimal
threshold $k$ exists.

We solved (56) numerically over a range of channel parameters $(p, r)$ with $p, r > 0$ and always
found an optimal threshold parameter up to $K = 11$. We did not check higher values of $K$, but we
conjecture that the same result holds. Note that as mentioned in the definition of this encoder, we
considered only an odd number of hidden units $K$. Therefore, these results mean that the
committee tree with non-monotonic hidden units saturates the Shannon bound in the large
codeword length limit for any odd number of hidden units $K$.
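The binomial expression for $\Omega_{CTH}$ can be cross-checked against a direct enumeration of the hidden unit outputs, and the threshold can again be found by bisection. The sketch below is illustrative, with hypothetical function names; it assumes that $K$ is odd and uses the fact that each hidden unit outputs $-1$ with probability $2H(k)$ and $+1$ with probability $1 - 2H(k)$.

```python
import itertools
import math


def H(x):
    """Gaussian tail probability H(x) = int_x^inf Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def omega_cth(k, K):
    """Omega_CTH from the binomial sum; K must be odd."""
    b = 2.0 * H(k)   # probability that a hidden unit outputs -1
    a = 1.0 - b      # probability that a hidden unit outputs +1
    return sum(math.comb(K, l) * (a ** (K - l) * b ** l - b ** (K - l) * a ** l)
               for l in range((K - 1) // 2 + 1))


def omega_cth_enumerated(k, K):
    """Same quantity, summing over all 2^K hidden unit sign patterns."""
    b = 2.0 * H(k)
    a = 1.0 - b
    total = 0.0
    for signs in itertools.product((1, -1), repeat=K):
        weight = 1.0
        for sign in signs:
            weight *= a if sign == 1 else b
        total += weight * (1 if sum(signs) > 0 else -1)
    return total


def optimal_threshold_cth(omega_c, K, tol=1e-12):
    """Find k with omega_cth(k, K) = omega_c by bisection.

    omega_cth increases from -1 at k = 0 towards +1 for large k.
    """
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if omega_cth(mid, K) < omega_c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example usage with an arbitrary target bias and K = 5 hidden units:
k_opt = optimal_threshold_cth(0.05, 5)
```

For $K = 1$ the sum reduces to $1 - 4H(k)$, in agreement with the parity tree with non-monotonic hidden units.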

C. Tuning of the parity tree 

In the parity tree case, we have 

^ p+oo 

n = npT = ll Dzix sgn{zi). (57) 
1=1 

The parity tree is optimal if and only if 

= nc = 0. (58) 

This gives us a strong condition on $\Omega_c$. From the definition (17), it can easily be seen
that $\Omega_c = 0$ if and only if $r = p$, that is, when the BAC reduces to the particular
case of the BSC. This means that the standard monotonic parity tree saturates
the Shannon bound in the large codeword length limit, but only in the BSC case and for a
number of hidden units $K > 1$. This confirms what we expected and is the equivalent of
the lossy compression result of Mimura et al. [12].
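
The vanishing of $\Omega_{PT}$ can be checked in one line. The sketch below assumes that $Dz$ denotes the standard Gaussian measure, $Dz = \frac{dz}{\sqrt{2\pi}} e^{-z^2/2}$, which is even in $z$, so that each factor of the product in (57) integrates an odd function against an even measure:

```latex
\int_{-\infty}^{+\infty} Dz \, \mathrm{sgn}(z)
  = \int_{0}^{+\infty} Dz - \int_{-\infty}^{0} Dz
  = \frac{1}{2} - \frac{1}{2} = 0
\quad \Longrightarrow \quad
\Omega_{PT} = \prod_{l=1}^{K} 0 = 0 \quad \text{for every } K \geq 1 .
```

Unlike the non-monotonic schemes, the monotonic parity tree therefore has no threshold to tune, and the optimality condition can only be met when $\Omega_c$ itself vanishes.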



VII. CONCLUSION AND DISCUSSION 



We investigated an error correcting code scheme for uniformly unbiased Boolean messages 
using parity tree and committee tree multilayer perceptrons. All the schemes which use the 
non-monotonic transfer function in their hidden layer were shown to saturate the Shannon 
bound under some specific conditions. The use of $f_k$ enables the relevant schemes to deal
with asymmetric channels like the BAC while monotonic networks using only the standard 
sign function can deal only with symmetric channels like the BSC. 

Indeed, we confirmed that the standard monotonic parity tree saturates the Shannon 
bound only in the case of the BSC channel. The standard monotonic committee tree however, 
fails to provide optimal performance even in the BSC case. 

As a general conclusion, this paper shows that tree-like multilayer perceptrons introduced 
in [12, 13] within the framework of lossy compression can also be used efficiently in
an error correcting code scheme. For each network considered, we provided a theoretical 
analysis of the typical performance and gave the necessary conditions for obtaining optimal 
performance. In each case, we were able to derive results similar to the lossy compression 
results. Finally, in the case of error correcting codes, the stability of the replica symmetric solution
[18] was not checked because no replica symmetry breaking is expected on the Nishimori
line [29].



However, this paper discusses only the typical performance of the schemes at infinite codeword
length and does not provide any explicit decoder. Because the present schemes
make use of densely connected systems, a formal decoder cannot be implemented, as it would
require a decoding time which grows exponentially with the size of the original message.
One promising alternative is to use the popular belief propagation (BP) algorithm to
calculate an approximation of the marginalized posterior probabilities. The BP algorithm is
known to give good results when working in the ferromagnetic phase, where no frustration
is present in the system.



With the previous work done on lossy compression [10, 12, 13, 30] and on error correcting
codes [17] using perceptron type networks, there is now a sufficient theoretical background
to investigate and compare the practical performance (at finite codeword length)
of all the schemes with the theoretical performance. In the case of lossy compression with
a simple perceptron, the performance of the BP algorithm has already been studied by
Hosaka et al. [30]. Their work provides a solid base from which to begin investigating the
more complicated multilayer structure. The influence of the number of hidden units on the
practical performance of the scheme is an interesting issue which will be examined in future
work.

Appendix A: Analytical Evaluation using the replica method



The free energy can be evaluated by the replica method. 



$$f = -\frac{1}{\beta N} \lim_{n \to 0} \frac{\langle Z(\beta, y, x)^n \rangle_{y,x} - 1}{n}, \tag{A1}$$

where $Z(\beta, y, x)^n$ denotes the $n$-times replicated partition function

$$Z(\beta, y, x)^n = \sum_{s^1, \ldots, s^n} \prod_{a=1}^{n} \cdots \tag{A2}$$

The vector $s^a$ is given by $s^a = (s_1^a, \ldots, s_K^a)$ and the superscript $a$ denotes the replica index.
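
The replica method rests on the identity $\langle \ln Z \rangle = \lim_{n \to 0} (\langle Z^n \rangle - 1)/n$, which can be illustrated numerically. The sketch below is only an illustration: the list `z_samples` holds made-up positive numbers standing in for the partition function under different disorder realizations, and the function names are hypothetical.

```python
import math


def quenched_average(z_samples):
    """Direct average of ln Z over the disorder samples."""
    return sum(math.log(z) for z in z_samples) / len(z_samples)


def replica_estimate(z_samples, n):
    """(<Z^n> - 1) / n, which tends to <ln Z> as n -> 0."""
    moment = sum(z**n for z in z_samples) / len(z_samples)
    return (moment - 1.0) / n


# Hypothetical partition function values, one per disorder realization.
z_samples = [0.5, 1.5, 2.0, 4.0, 7.5]

if __name__ == "__main__":
    exact = quenched_average(z_samples)
    for n in (1.0, 0.1, 0.001):
        # The estimate approaches the direct average as n decreases.
        print(n, replica_estimate(z_samples, n), exact)
```

In the analytical calculation the moment $\langle Z^n \rangle$ is first computed for natural numbers $n$ and then continued to real $n \to 0$; the snippet only checks the limit itself, not that continuation.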

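We proceed to the calculation of the replicated partition function (A2). Inserting the
following two identities,

$$1 = \prod_{a=1}^{n} \prod_{l=1}^{K} \int dm_l^a \, \delta\!\left( s_l^0 \cdot s_l^a - \frac{N}{K} m_l^a \right) = \left( \frac{1}{2\pi i} \right)^{nK} \int \left[ \prod_{a,l} dm_l^a \, d\hat{m}_l^a \right] \exp\left[ \sum_{a,l} \hat{m}_l^a \left( s_l^0 \cdot s_l^a - \frac{N}{K} m_l^a \right) \right], \tag{A3}$$

and

$$1 = \prod_{a<b} \prod_{l=1}^{K} \int dq_l^{ab} \, \delta\!\left( s_l^a \cdot s_l^b - \frac{N}{K} q_l^{ab} \right) = \left( \frac{1}{2\pi i} \right)^{n(n-1)K/2} \int \left[ \prod_{a<b,l} dq_l^{ab} \, d\hat{q}_l^{ab} \right] \exp\left[ \sum_{a<b} \sum_{l} \hat{q}_l^{ab} \left( s_l^a \cdot s_l^b - \frac{N}{K} q_l^{ab} \right) \right], \tag{A4}$$

into (A2) enables us to separate the relevant order parameters and to calculate the average
moment $\langle Z(\beta, y, x)^n \rangle_{y,x}$ for natural numbers $n$ as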



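$$\langle Z(\beta, y, x)^n \rangle_{y,x} = \int \left[ \prod_{a,l} \frac{dm_l^a \, d\hat{m}_l^a}{2\pi i} \right] \left[ \prod_{a<b,l} \frac{dq_l^{ab} \, d\hat{q}_l^{ab}}{2\pi i} \right] \exp\left\{ N \left[ \, \cdots \, \right] \right\}, \tag{A5}$$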



where $Q_l$ is an $n \times n$ matrix having elements $\{q_l^{ab}\}$ and where $\boldsymbol{m}_l$ is an $n$-dimensional vector
having elements $\{m_l^a\}$. We analyze the scheme in the thermodynamic limit $N, M \to +\infty$
while the code rate $R$ is kept finite. In this limit, (A5) can be evaluated using the saddle
point method with respect to $m_l^a$, $\hat{m}_l^a$, $q_l^{ab}$, $\hat{q}_l^{ab}$, so that the free energy can be retrieved. To
continue the calculation, we have to make some assumptions about the structure of these
order parameters. In this paper, we use the so-called replica symmetric (RS) ansatz.



$$m_l^a = m, \quad \hat{m}_l^a = \hat{m}, \quad q_l^{ab} = (1 - q)\delta_{ab} + q, \quad \hat{q}_l^{ab} = (1 - \hat{q})\delta_{ab} + \hat{q}, \tag{A6}$$

where $\delta_{ab}$ denotes the Kronecker delta. This ansatz means that all the hidden units are
equivalent after averaging over the disorder.
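
One practical consequence of the RS form is that a matrix with entries $(1 - q)\delta_{ab} + q$ has only two distinct eigenvalues: $1 + (n - 1)q$ for the uniform vector and $1 - q$ for any vector whose entries sum to zero. The following minimal sketch checks this for arbitrary illustrative values of $n$ and $q$; the helper names are hypothetical and the snippet is not part of the original derivation.

```python
def rs_matrix(n, q):
    """n x n replica symmetric matrix with entries (1 - q) * delta_ab + q."""
    return [[1.0 if a == b else q for b in range(n)] for a in range(n)]


def matvec(Q, v):
    """Plain matrix-vector product for a list-of-lists matrix."""
    return [sum(Q_ab * v_b for Q_ab, v_b in zip(row, v)) for row in Q]


n, q = 4, 0.3  # arbitrary illustrative values
Q = rs_matrix(n, q)
uniform = [1.0] * n
difference = [1.0, -1.0] + [0.0] * (n - 2)

if __name__ == "__main__":
    # Each entry equals 1 + (n - 1) * q, up to rounding.
    print(matvec(Q, uniform))
    # Equals (1 - q) * difference, up to rounding.
    print(matvec(Q, difference))
```

This two-eigenvalue structure is what allows quadratic forms in the replica indices to be split into a diagonal part and a squared-sum part before the limit $n \to 0$ is taken.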

Also note that, by definition, the order parameter $m$ is equivalent to the quantity that
gives the overlap between the decoded message $s$ and the original message $s^0$. An overlap
of 1 indicates perfect decoding while an overlap of 0 denotes complete failure.